
docs: add deployment guidance for llm fine-tuning examples#1740

Open
gmartini2000 wants to merge 1 commit into NVIDIA-NeMo:main from gmartini2000:docs/automodel-deployment-guide

Conversation

@gmartini2000

What does this PR do ?

Adds a README to the LLM fine-tuning examples directory that explains how to run recipes, clarifies the deprecation of `finetune.py`, and outlines next steps after training, including basic deployment direction.

Changelog

  • Added examples/llm_finetune/README.md
  • Documented how to launch fine-tuning recipes using the automodel CLI
  • Added note explaining that finetune.py is deprecated
  • Added section describing how to use trained checkpoints after fine-tuning
  • Added high-level deployment guidance and pointers to production workflows

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

Signed-off-by: Giulio Martini <martinigiulio02@gmail.com>
@copy-pr-bot

copy-pr-bot Bot commented Apr 9, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@akoumpa akoumpa requested a review from jgerh April 10, 2026 05:37
@akoumpa akoumpa added the docs-only With great power comes great responsibility. label Apr 10, 2026
@chtruong814 chtruong814 added the needs-follow-up Issue needs follow-up label Apr 11, 2026
@akoumpa akoumpa linked an issue Apr 16, 2026 that may be closed by this pull request
@svcnvidia-nemo-ci svcnvidia-nemo-ci added waiting-on-maintainers Waiting on maintainers to respond and removed needs-follow-up Issue needs follow-up labels Apr 21, 2026
Contributor

@akoumpa akoumpa left a comment


A few inline suggestions on the new README.

@@ -0,0 +1,64 @@
# LLM Fine-Tuning Examples

This directory contains NeMo AutoModel LLM fine-tuning recipes organized by model family. Each subdirectory provides YAML configs for a specific family, such as Llama, Mistral, Qwen, Gemma, Nemotron, and others. The main AutoModel README identifies `examples/llm_finetune/` as the location for LLM fine-tune configs and shows these recipes being launched through the `automodel` CLI.

Suggested change
This directory contains NeMo AutoModel LLM fine-tuning recipes organized by model family. Each subdirectory provides YAML configs for a specific family, such as Llama, Mistral, Qwen, Gemma, Nemotron, and others. The main AutoModel README identifies `examples/llm_finetune/` as the location for LLM fine-tune configs and shows these recipes being launched through the `automodel` CLI.
This directory holds YAML recipes for fine-tuning LLMs with NeMo AutoModel. Each recipe pairs a config (the YAML) with a recipe class (here, `TrainFinetuneRecipeForNextTokenPrediction`); you launch it with the `automodel` CLI.
Pick your path:
| Goal | Recipe variant | Launch |
| ------------------------ | --------------------------------------- | ------------------------------------- |
| Full SFT, single node | `<family>/<model>_<dataset>.yaml` | `automodel <yaml> --nproc-per-node N` |
| LoRA / PEFT, single node | `<family>/<model>_<dataset>_peft.yaml` | same as above |
| Multi-node on SLURM | any of the above | `sbatch` (see *Multi-Node Launches*) |
Subdirectories group recipes by model family (Llama, Mistral, Qwen, Gemma, Nemotron, …).
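
For orientation, a hedged sketch of what a recipe YAML could contain. The keys below are illustrative placeholders, not AutoModel's actual schema; open a real recipe such as `llama3_2/llama3_2_1b_squad.yaml` for the authoritative field names:

```yaml
# Illustrative only: key names here are hypothetical, not the real recipe schema.
model:
  pretrained_model_name_or_path: meta-llama/Llama-3.2-1B
dataset:
  name: squad
training:
  learning_rate: 2.0e-5
  max_steps: 1000
```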


## Running a Recipe

Set up the environment with `uv`, then launch a recipe with `automodel`:

Suggested change
Set up the environment with `uv`, then launch a recipe with `automodel`:
Recipes are launched through the `automodel` CLI (or its short alias `am`) — both are console scripts wrapping [`nemo_automodel/cli/app.py`](../../nemo_automodel/cli/app.py). For full setup and CLI options, see the [main README](../../README.md#getting-started); for end-to-end examples, see the [LLM SFT](../../README.md#llm-supervised-fine-tuning-sft) and [PEFT](../../README.md#llm-parameter-efficient-fine-tuning-peft) sections. Full reference docs: [docs.nvidia.com/nemo/automodel](https://docs.nvidia.com/nemo/automodel/latest/index.html).
Set up the environment with `uv`, then run a recipe:

```bash
automodel examples/llm_finetune/llama3_2/llama3_2_1b_squad.yaml --nproc-per-node 8
```

These commands follow the repository's documented setup and launch pattern.

Suggested change
These commands follow the repository's documented setup and launch pattern.

Comment on lines +23 to +31
## Important Note on `finetune.py`

A legacy `finetune.py` entry point exists in this directory, but it is deprecated. The script emits a deprecation warning and explicitly instructs users to launch recipes with:

```bash
automodel <config.yaml> [--nproc-per-node N]
```

So new documentation in this directory should prefer `automodel` over `python finetune.py`. This is also consistent with the main README's documented usage. The inspected script loads a config, constructs `TrainFinetuneRecipeForNextTokenPrediction`, then runs `setup()` followed by `run_train_validation_loop()`, which confirms that these examples are training-entry recipes rather than deployment scripts.

Suggested change
## Important Note on `finetune.py`
A legacy `finetune.py` entry point exists in this directory, but it is deprecated. The script emits a deprecation warning and explicitly instructs users to launch recipes with:
```bash
automodel <config.yaml> [--nproc-per-node N]
```
So new documentation in this directory should prefer `automodel` over `python finetune.py`. This is also consistent with the main README's documented usage. The inspected script loads a config, constructs `TrainFinetuneRecipeForNextTokenPrediction`, then runs `setup()` followed by `run_train_validation_loop()`, which confirms that these examples are training-entry recipes rather than deployment scripts.
> [!NOTE]
> A legacy `finetune.py` still exists in this directory but is deprecated — it prints a `DeprecationWarning` and tells you to use `automodel` instead. Do not write new docs or examples around it.

Comment on lines +38 to +39
cp slurm.sub my_cluster.sub
sbatch my_cluster.sub

slurm.sub is at the repo root, not in this directory.

Suggested change
cp slurm.sub my_cluster.sub
sbatch my_cluster.sub
cp ../../slurm.sub my_cluster.sub # slurm.sub lives at the repo root
# edit my_cluster.sub: --nodes, --partition, container image, mounts, recipe path
sbatch my_cluster.sub


Cluster-specific settings such as nodes, GPUs, partition, container, and mounts should be defined in the sbatch script. NeMo-Run sections are also supported through the cluster guide.

Suggested change
Cluster-specific settings such as nodes, GPUs, partition, container, and mounts should be defined in the sbatch script. NeMo-Run sections are also supported through the cluster guide.
Cluster-specific settings (`--nodes`, `--gpus`, `--partition`, container image, mounts, recipe path) live in the sbatch script. For the NeMo-Run launcher, see [`docs/launcher/slurm.md`](../../docs/launcher/slurm.md).
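
A minimal sketch of what an edited `my_cluster.sub` might look like. Every value below (node count, partition, time limit) is a placeholder, and the `srun automodel ...` line is an assumed launch pattern rather than a copy of the repo-root `slurm.sub`:

```bash
#!/bin/bash
#SBATCH --nodes=2                      # placeholder: adjust to your cluster
#SBATCH --gpus-per-node=8              # placeholder
#SBATCH --partition=your_partition     # placeholder
#SBATCH --time=04:00:00

# Assumed launch pattern; mirror the real repo-root slurm.sub (container image,
# mounts, environment) rather than copying this sketch verbatim.
srun automodel examples/llm_finetune/llama3_2/llama3_2_1b_squad.yaml --nproc-per-node 8
```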

Comment on lines +48 to +54
## Deployment Guidance

This examples directory does not currently document a single canonical deployment command for all fine-tuned LLM recipes. Based on the materials reviewed here, the safest documented guidance is:

1. **Use the generated checkpoints in your follow-up evaluation or inference workflow.**
2. **Use AutoModel's documented container workflow** when you want a reproducible GPU-backed environment. The contributing guide documents both the AutoModel container path and a custom Docker build path.
3. **Refer to the broader NeMo and AutoModel documentation for production deployment architecture**, rather than assuming a serving/export API directly from these training examples. The repository positions AutoModel as part of the broader NeMo ecosystem for scalable training and deployment-oriented environments.

Suggested change
## Deployment Guidance
This examples directory does not currently document a single canonical deployment command for all fine-tuned LLM recipes. Based on the materials reviewed here, the safest documented guidance is:
1. **Use the generated checkpoints in your follow-up evaluation or inference workflow.**
2. **Use AutoModel's documented container workflow** when you want a reproducible GPU-backed environment. The contributing guide documents both the AutoModel container path and a custom Docker build path.
3. **Refer to the broader NeMo and AutoModel documentation for production deployment architecture**, rather than assuming a serving/export API directly from these training examples. The repository positions AutoModel as part of the broader NeMo ecosystem for scalable training and deployment-oriented environments.
## Deployment
These examples are training recipes; this directory does not own a deployment path. See the [main README](../../README.md) and the [NeMo AutoModel docs](https://docs.nvidia.com/nemo/automodel/latest/index.html) for serving and export guidance.

Comment on lines +56 to +64
## Development Notes

If you update documentation here, the contributing guide points contributors to the documentation development guide and requires signed-off commits:

```bash
git commit -s -m "docs: add llm finetune README"
```

Unsigned commits are not accepted. No newline at end of file

Suggested change
## Development Notes
If you update documentation here, the contributing guide points contributors to the documentation development guide and requires signed-off commits:
```bash
git commit -s -m "docs: add llm finetune README"
```
Unsigned commits are not accepted.

@akoumpa
Contributor

akoumpa commented Apr 27, 2026

Hi @gmartini2000 , thanks for making the doc, and I apologize for the delayed response. I think this is a good doc to include as a readme for the recipes folder, I've added some suggestions, please let me know what you think. Thank you.

@svcnvidia-nemo-ci svcnvidia-nemo-ci added waiting-on-customer Waiting on the original author to respond and removed waiting-on-maintainers Waiting on maintainers to respond labels Apr 28, 2026

Labels

community-request docs-only With great power comes great responsibility. waiting-on-customer Waiting on the original author to respond

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Path for deployment from customization examples

4 participants